11 research outputs found

    Novel scaling law governing stock price dynamics

    A stock market is typically modeled as a complex system where the purchase, holding or selling of individual stocks affects other stocks in nonlinear and collaborative ways that cannot always be captured using succinct models. Such complexity arises due to several latent and confounding factors, such as variations in decision making because of incomplete information, and differing short/long-term objectives of traders. While a few emergent phenomena such as seasonality and fractal behaviors in individual stock price data have been reported, universal scaling laws that apply collectively to the market are rare. In this paper, we consider the market-mode adjusted pairwise correlations of returns over different time scales ($\tau$), $c_{i,j}(\tau)$, and discover two such novel emergent phenomena: (i) the standard deviation of the $c_{i,j}(\tau)$'s scales as $\tau^{-\lambda}$, for $\tau$ larger than a certain return horizon, $\tau_0$, where $\lambda$ is the scaling exponent, (ii) moreover, the scaled and zero-shifted distributions of the $c_{i,j}(\tau)$'s are invariant across all $\tau > \tau_0$. Our analysis of S&P 500 market data collected over almost 20 years (2004-2020) demonstrates that the twin scaling property holds for each year and across 2 decades (orders of magnitude) of $\tau$. Moreover, we find that the scaling exponent $\lambda$ provides a summary view of market volatility: in years marked by unprecedented financial crises -- for example 2008 and 2020 -- values of $\lambda$ are substantially higher. As for analytical modeling, we demonstrate that such scaling behaviors observed in data cannot be explained by existing theoretical frameworks such as the single- and multi-factor models. To close this gap, we introduce a promising agent-based model -- inspired by literature on swarming -- that displays more of the emergent behaviors exhibited by the real market data. Comment: 45 pages
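
The power-law relation in the abstract above, where the standard deviation of the correlations scales as $\tau^{-\lambda}$, suggests one simple way an exponent like $\lambda$ could be estimated: a straight-line fit in log-log space. The sketch below is only an illustration under that assumption, not the paper's procedure. The function name `fit_scaling_exponent` and the synthetic values are hypothetical, and no market data is involved.

```python
import math

def fit_scaling_exponent(taus, sigmas):
    """Least-squares fit of sigma(tau) = A * tau**(-lam) in log-log space.

    Hypothetical helper for illustration; returns (lam, A).
    """
    xs = [math.log(t) for t in taus]
    ys = [math.log(s) for s in sigmas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for the line log(sigma) = intercept + slope * log(tau)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The scaling exponent is the negated slope
    return -slope, math.exp(intercept)

# Synthetic, noise-free input generated from a known power law (assumed values)
taus = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
sigmas = [0.3 * t ** -0.5 for t in taus]

lam, amplitude = fit_scaling_exponent(taus, sigmas)
print(f"lambda = {lam:.3f}, amplitude = {amplitude:.3f}")
```

On real data the fit would presumably be restricted to $\tau > \tau_0$, since the abstract states that the scaling holds only beyond that return horizon.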

    Which side are you on? Insider-Outsider classification in conspiracy-theoretic social media

    Social media is a breeding ground for threat narratives and related conspiracy theories. In these, an outside group threatens the integrity of an inside group, leading to the emergence of sharply defined group identities: Insiders -- agents with whom the authors identify -- and Outsiders -- agents who threaten the insiders. Inferring the members of these groups constitutes a challenging new NLP task: (i) Information is distributed over many poorly-constructed posts; (ii) Threats and threat agents are highly contextual, with the same post potentially having multiple agents assigned to membership in either group; (iii) An agent's identity is often implicit and transitive; and (iv) Phrases used to imply Outsider status often do not follow common negative sentiment patterns. To address these challenges, we define a novel Insider-Outsider classification task. Because we are not aware of any appropriate existing datasets or attendant models, we introduce a labeled dataset (CT5K) and design a model (NP2IO) to address this task. NP2IO leverages pretrained language modeling to classify Insiders and Outsiders. NP2IO is shown to be robust, generalizing to noun phrases not seen during training, and exceeding the performance of non-trivial baseline models by 20%. Comment: ACL 2022: 60th Annual Meeting of the Association for Computational Linguistics, 8+4 pages, 6 figures

    Embed-Search-Align: DNA Sequence Alignment using Transformer Models

    DNA sequence alignment involves assigning short DNA reads to the most probable locations on an extensive reference genome. This process is crucial for various genomic analyses, including variant calling, transcriptomics, and epigenomics. Conventional methods, refined over decades, tackle this challenge in two steps: genome indexing followed by efficient search to locate likely positions for given reads. Building on the success of Large Language Models (LLMs) in encoding text into embeddings, where the distance metric captures semantic similarity, recent efforts have explored whether the same Transformer architecture can produce numerical representations for DNA sequences. Such models have shown early promise in tasks involving classification of short DNA sequences, such as the detection of coding vs non-coding regions, as well as the identification of enhancer and promoter sequences. Performance at sequence classification tasks does not, however, translate to sequence alignment, where it is necessary to conduct a genome-wide search to successfully align every read. We address this open problem by framing it as an Embed-Search-Align task. In this framework, a novel encoder model DNA-ESA generates representations of reads and fragments of the reference, which are projected into a shared vector space where the read-fragment distance is used as a surrogate for alignment. In particular, DNA-ESA introduces: (1) Contrastive loss for self-supervised training of DNA sequence representations, facilitating rich sequence-level embeddings, and (2) a DNA vector store to enable search across fragments on a global scale. DNA-ESA is >97% accurate when aligning 250-length reads onto a human reference genome of 3 gigabases (single-haploid), far exceeds the performance of 6 recent DNA-Transformer model baselines, and shows task transfer across chromosomes and species. Comment: 17 pages, Tables 5, Figures 5, Under review, ICL
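
The embed-and-search idea described in the abstract above can be sketched with a toy example. Everything here is an assumption made for illustration. The `embed` function is a plain k-mer count vector standing in for a learned encoder (it is not DNA-ESA), the "vector store" is an in-memory list, the reference string is made up, and the final fine-grained alignment step is omitted.

```python
import math
from collections import Counter
from itertools import product

K = 3  # k-mer size for the toy embedding (assumed value)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def embed(seq):
    """Toy stand-in encoder: L2-normalised k-mer count vector (NOT DNA-ESA)."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    vec = [float(counts.get(kmer, 0)) for kmer in KMERS]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_store(reference, frag_len, stride):
    """Embed overlapping reference fragments; return (start, vector) pairs."""
    last = max(len(reference) - frag_len, 0)
    return [(s, embed(reference[s:s + frag_len]))
            for s in range(0, last + 1, stride)]

def search(store, read):
    """Return the start of the fragment most similar to the read (cosine)."""
    query = embed(read)
    def similarity(item):
        return sum(a * b for a, b in zip(query, item[1]))
    return max(store, key=similarity)[0]

# Made-up 40-base reference, cut into 10-base fragments every 5 bases
reference = "ACGTTGCAAGGCTTACCGATAGGCTAACGTTAGCCATGGA"
store = build_store(reference, frag_len=10, stride=5)

read = reference[10:20]          # a read copied from a known position
best_start = search(store, read)
print(best_start)
```

A brute-force scan like this is only workable for tiny inputs. Searching fragments of a gigabase-scale reference would need an approximate nearest-neighbour index, which this sketch does not attempt.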

    An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com

    Reader reviews of literary fiction on social media, especially those in persistent, dedicated forums, create and are in turn driven by underlying narrative frameworks. In their comments about a novel, readers generally include only a subset of characters and their relationships, thus offering a limited perspective on that work. Yet in aggregate, these reviews capture an underlying narrative framework comprised of different actants (people, places, things), their roles, and interactions that we label the "consensus narrative framework". We represent this framework in the form of an actant-relationship story graph. Extracting this graph is a challenging computational problem, which we pose as a latent graphical model estimation problem. Posts and reviews are viewed as samples of subgraphs/networks of the hidden narrative framework. Inspired by the qualitative narrative theory of Greimas, we formulate a graphical generative Machine Learning (ML) model where nodes represent actants, and multi-edges and self-loops among nodes capture context-specific relationships. We develop a pipeline of interlocking automated methods to extract key actants and their relationships, and apply it to thousands of reviews and comments posted on Goodreads.com. We manually derive the ground truth narrative framework from SparkNotes, and then use word embedding tools to compare relationships in ground truth networks with our extracted networks. We find that our automated methodology generates highly accurate consensus narrative frameworks: for our four target novels, with approximately 2900 reviews per novel, we report average coverage/recall of important relationships of > 80% and an average edge detection rate of > 89%. These extracted narrative frameworks can generate insight into how people (or classes of people) read and how they recount what they have read to others.

    Mapping dreams in a computational space

    This article demonstrates that an automated system of linguistic analysis can be developed – the Oneirograph – to analyze large collections of dreams and computationally map their contents in terms of typical situations involving an interplay of characters, activities, and settings. Focusing the analysis first on the twin situations of fighting and fleeing, the results provide densely detailed empirical evidence of the underlying semantic structures of typical dreams. The results also indicate that the Oneirograph analytic system can be applied to other typical dream situations as well (e.g., flying, falling), each of which can be computationally mapped in terms of a distinctive constellation of characters, activities, and settings.

    Modelling social readers: novel tools for addressing reception from online book reviews

    Social reading sites offer an opportunity to capture a segment of readers' responses to literature, while data-driven analysis of these responses can provide new critical insight into how people 'read'. Posts discussing an individual book on the social reading site Goodreads are referred to as 'reviews', and consist of summaries, opinions, quotes or some mixture of these. Computationally modelling these reviews allows one to discover the non-professional discussion space about a work, including an aggregated summary of the work's plot, an implicit sequencing of various subplots and readers' impressions of main characters. We develop a pipeline of interlocking computational tools to extract a representation of this reader-generated shared narrative model. Using a corpus of reviews of five popular novels, we discover readers' distillation of the novels' main storylines and their sequencing, as well as the readers' varying impressions of characters in the novel. In so doing, we make three important contributions to the study of infinite-vocabulary networks: (i) an automatically derived narrative network that includes meta-actants; (ii) a sequencing algorithm, REV2SEQ, that generates a consensus sequence of events based on partial trajectories aggregated from reviews; and (iii) an 'impressions' algorithm, SENT2IMP, that provides multi-modal insight into readers' opinions of characters.